accelerated stochastic greedy coordinate descent
Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex
In this paper we study the well-known greedy coordinate descent (GCD) algorithm to solve $\ell_1$-regularized problems and improve GCD by the two popular strategies: Nesterov's acceleration and stochastic optimization. Firstly, we propose a new rule for greedy selection based on an $\ell_1$-norm square approximation which is nontrivial to solve but convex; then an efficient algorithm called ``SOft ThreshOlding PrOjection (SOTOPO)'' is proposed to exactly solve the $\ell_1$-regularized $\ell_1$-norm square approximation problem, which is induced by the new rule. Based on the new rule and the SOTOPO algorithm, the Nesterov's acceleration and stochastic optimization strategies are then successfully applied to the GCD algorithm. The resulted algorithm called accelerated stochastic greedy coordinate descent (ASGCD) has the optimal convergence rate $O(\sqrt{1/\epsilon})$; meanwhile, it reduces the iteration complexity of greedy selection up to a factor of sample size. Both theoretically and empirically, we show that ASGCD has better performance for high-dimensional and dense problems with sparse solution.
Reviews: Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex
Paper Summary: The main idea is that Nesterov's acceleration method's and Stochastic Gradient Descent's (SGD) advantages are used to solve sparse and dense optimization problems with high-dimensions by using an improved GCD (Greedy Coordinate Descent) algorithm. First, by using a greedy rule, an l_1 -square-regularized approximate optimization problem (find a solution close to x * within a neighborhood \epsilon) can be reformulated as a convex but non-trivial to solve problem. Then, the same problem is solved as an exact problem by using the SOTOPO algorithm. Finally, the solution is improved by using both the convergence rate advantage of Nesterov's method and the "reduced-by-one-sample" complexity of SGD. The resulted algorithm is an improved GCD (ASGCD Accelerated Stochastic Greedy Coordinate Descent) with a convergence rate of O(\sqrt{1/\epsilon}) and complexity reduced-by-one-sample compared to the vanilla GCD.